Enforce per-tool timeouts and enhanced circuit breaker plugin#2569
Merged
crivetimihai merged 1 commit intomainfrom Feb 1, 2026
Merged
Enforce per-tool timeouts and enhanced circuit breaker plugin#2569crivetimihai merged 1 commit intomainfrom
crivetimihai merged 1 commit intomainfrom
Conversation
f09e918 to
2285ea2
Compare
Member
Code reviewFound 1 issue:
mcp-context-forge/mcpgateway/services/tool_service.py Lines 2880 to 2884 in 8fbd02d The counter at line 2882 uses tool_timeout_counter.labels(tool_name=name).inc()But the log at line 2870 uses message=f"REST tool invocation timed out: {tool_name_computed}",This same inconsistency appears in the SSE and StreamableHTTP timeout handlers (lines 3147 and 3291 use |
2285ea2 to
3ef8dbb
Compare
Open
7 tasks
3ef8dbb to
5efaabb
Compare
Member
Code Review Changes AppliedRebased onto High Severity Fixes
Medium Severity Fixes
Low Severity Fixes
New Tests Added
Summary of ChangesAll 48 tests pass. Ready for merge. |
Implement strict per-tool timeout enforcement for all transports (REST, SSE, StreamableHTTP, A2A) and enhance the CircuitBreakerPlugin with half-open states, retry headers, and granular configuration. Changes: - Wrap all tool invocations in asyncio.wait_for with effective_timeout - Add per-tool timeout_ms support (ms to seconds conversion) - Add half-open state for circuit breaker recovery testing - Add half_open_in_flight flag to prevent concurrent probe requests - Add retry_after_seconds in violation response for rate limiting - Add tool_timeout_total and circuit_breaker_open_total Prometheus metrics - Add cb_timeout_failure context flag for timeout detection in plugins - Add tool_overrides for per-tool circuit breaker configuration - Handle both asyncio.TimeoutError and httpx.TimeoutException - Log actual elapsed time instead of configured timeout Fixes applied during review: - Fix _is_error() to detect camelCase isError from model_dump(by_alias=True) - Fix half-open probe guard: only check when st.half_open is True - Add stale-probe timeout to prevent permanent wedge if plugin blocks - Add timeout enforcement to A2A tool invocations - Call tool_post_invoke on exceptions so circuit breaker tracks failures - Add ToolTimeoutError subclass to distinguish timeouts from other errors - Only skip post_invoke for ToolTimeoutError (not all ToolInvocationError) - Set error_message and span attributes for ToolTimeoutError observability - Update README to document isError camelCase support Timeout precedence: 1. Per-tool timeout_ms (if set and non-zero) 2. Global TOOL_TIMEOUT setting (default: 60s) Closes #2078 Co-authored-by: Keval Mahajan <[email protected]> Signed-off-by: Mihai Criveti <[email protected]>
5efaabb to
dcde01e
Compare
crivetimihai
approved these changes
Feb 1, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
✨ Feature / Enhancement PR
🔗 Epic / Issue
Closes #2078
🚀 Summary (1-2 sentences)
This PR implements strict per-tool timeout enforcement for all transports (REST, SSE, StreamableHTTP) and significantly enhances the
CircuitBreakerPluginwith half-open states, retry headers, and granular configuration overrides.All failure logic is decoupled via the Plugin System:
ToolService: Executes requests, enforces strict limits (Timeouts), reports status.
CircuitBreakerPlugin: Decides if a request should be attempted based on history.
Configurations for timeout:
# Env Variable: TOOL_TIMEOUTTool Schema:Example Precedence Calculation:
🧪 Checks
make lintpassesmake testpasses📓 Notes (optional)
Key Changes
asyncio.wait_for.timeout_msoverrides global settings.plugins/circuit_breaker.py):retry_after_secondsin violation response.cb_timeout_failurecontext flag.tool_timeout_totalandcircuit_breaker_open_totalPrometheus counters.flowchart TD Client -->|Invoke Tool| Gateway(ToolService) Gateway -->|1. check| CB[CircuitBreakerPlugin] CB -- Open? --> Reject[429 / Blocked] CB -- Closed? --> Gateway Gateway -->|2. Invoke w/ Timeout| Tool[External Tool] Tool -- Success --> Gateway Gateway -->|3. Record Success| CB Tool -- Timeout/Error --> Gateway Gateway -- Flag: cb_timeout_failure --> CB Gateway -->|4. Record Failure| CB CB -- Trip Threshold? --> CBState[Open Circuit]Feature Map & Implementation Details:
asyncio.wait_for. Prioritizes tool-specifictimeout_msover global default.TimeoutErrorand sets context flag. Plugin reads flag and counts as failure even without HTTP status.open_until - nowand returns exact seconds in violation details.tool_overridesmap allows custom thresholds for critical tools (e.g., higher patience for Payment APIs).tool_timeout_totalandcircuit_breaker_open_totalPrometheus counters.Verification
Unit Tests: Added TestToolTimeoutsAndRetries covering state transitions, timeout precedence, and plugin logic.